AITopics | episode number

Collaborating Authors

episode number

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SimuDICE: Offline Policy Optimization Through World Model Updates and DICE Estimation

Brita, Catalin E., Bongers, Stephan, Oliehoek, Frans A.

arXiv.org Artificial IntelligenceDec-9-2024

In offline reinforcement learning, deriving an effective policy from a pre-collected set of experiences is challenging due to the distribution mismatch between the target policy and the behavioral policy used to collect the data, as well as the limited sample size. Model-based reinforcement learning improves sample efficiency by generating simulated experiences using a learned dynamic model of the environment. However, these synthetic experiences often suffer from the same distribution mismatch. To address these challenges, we introduce SimuDICE, a framework that iteratively refines the initial policy derived from offline data using synthetically generated experiences from the world model. SimuDICE enhances the quality of these simulated experiences by adjusting the sampling probabilities of state-action pairs based on stationary DIstribution Correction Estimation (DICE) and the estimated confidence in the model's predictions. This approach guides policy improvement by balancing experiences similar to those frequently encountered with ones that have a distribution mismatch. Our experiments show that SimuDICE achieves performance comparable to existing algorithms while requiring fewer pre-collected experiences and planning steps, and it remains robust across varying data collection policies.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2412.06486

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Netherlands > South Holland > Delft (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Robust optimal well control using an adaptive multi-grid reinforcement learning framework

Dixit, Atish, ElSheikh, Ahmed H.

arXiv.org Artificial IntelligenceJul-13-2022

Reinforcement learning (RL) is a promising tool to solve robust optimal well control problems where the model parameters are highly uncertain, and the system is partially observable in practice. However, RL of robust control policies often relies on performing a large number of simulations. This could easily become computationally intractable for cases with computationally intensive simulations. To address this bottleneck, an adaptive multi-grid RL framework is introduced which is inspired by principles of geometric multi-grid methods used in iterative numerical algorithms. RL control policies are initially learned using computationally efficient low fidelity simulations using coarse grid discretization of the underlying partial differential equations (PDEs). Subsequently, the simulation fidelity is increased in an adaptive manner towards the highest fidelity simulation that correspond to finest discretization of the model domain. The proposed framework is demonstrated using a state-of-the-art, model-free policy-based RL algorithm, namely the Proximal Policy Optimisation (PPO) algorithm. Results are shown for two case studies of robust optimal well control problems which are inspired from SPE-10 model 2 benchmark case studies. Prominent gains in the computational efficiency is observed using the proposed framework saving around 60-70% of computational cost of its single fine-grid counterpart.

episode number, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2207.03253

Genre: Research Report > Promising Solution (0.34)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

LASSO regression using tidymodels and #TidyTuesday data for The Office

#artificialintelligenceMay-31-2021, 22:50:19 GMT

Our modeling goal here is to predict the IMDB ratings for episodes of The Office based on the other characteristics of the episodes in the #TidyTuesday dataset. There are two datasets, one with the ratings and one with information like director, writer, and which character spoke which line. The episode numbers and titles are not consistent between them, so we can use regular expressions to do a better job of matching the datasets up for joining. We are going to use several different kinds of features for modeling. First, let's find out how many times characters speak per episode.

lasso regression, modeling, tidytuesday data, (3 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.44)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.44)

Add feedback

Sequential mastery of multiple tasks: Networks naturally learn to learn

Davidson, Guy, Mozer, Michael C.

arXiv.org Machine LearningMay-28-2019

We explore the behavior of a standard convolutional neural net in a setting that introduces classification tasks sequentially and requires the net to master new tasks while preserving mastery of previously learned tasks. This setting corresponds to that which human learners face as they acquire domain expertise, for example, as an individual reads a textbook chapter-by-chapter. Through simulations involving sequences of ten related tasks, we find reason for optimism that nets will scale well as they advance from having a single skill to becoming domain experts. We observed two key phenomena. First, _forward facilitation_---the accelerated learning of task $n+1$ having learned $n$ previous tasks---grows with $n$. Second, _backward interference_---the forgetting of the $n$ previous tasks when learning task $n+1$---diminishes with $n$. Amplifying forward facilitation is the goal of research on metalearning, and attenuating backward interference is the goal of research on catastrophic forgetting. We find that both of these goals are attained simply through broader exposure to a domain.

artificial intelligence, criterion, machine learning, (17 more...)

arXiv.org Machine Learning

1905.10837

Country:

North America > United States > New York (0.04)
North America > United States > Colorado (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(3 more...)

Genre: Research Report (0.65)

Industry:

Education (0.93)
Media > Television (0.70)
Leisure & Entertainment (0.70)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Randomised Bayesian Least-Squares Policy Iteration

Tziortziotis, Nikolaos, Dimitrakakis, Christos, Vazirgiannis, Michalis

arXiv.org Artificial IntelligenceApr-6-2019

We introduce Bayesian least-squares policy iteration (BLSPI), an off-policy, model-free, policy iteration algorithm that uses the Bayesian least-squares temporal-difference (BLSTD) learning algorithm to evaluate policies. An online variant of BLSPI has been also proposed, called randomised BLSPI (RBLSPI), that improves its policy based on an incomplete policy evaluation step. In online setting, the exploration-exploitation dilemma should be addressed as we try to discover the optimal policy by using samples collected by ourselves. RBLSPI exploits the advantage of BLSTD to quantify our uncertainty about the value function. Inspired by Thompson sampling, RBLSPI first samples a value function from a posterior distribution over value functions, and then selects actions based on the sampled value function. The effectiveness and the exploration abilities of RBLSPI are demonstrated experimentally in several environments.

algorithm, bayesian inference, upstream oil & gas, (20 more...)

arXiv.org Artificial Intelligence

1904.03535

Country:

Europe > France (0.14)
Europe > Norway (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas > Upstream (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

The Concept of Criticality in Reinforcement Learning

Spielberg, Yitzhak, Azaria, Amos

arXiv.org Artificial IntelligenceOct-16-2018

Reinforcement learning methods carry a well known bias-variance trade-off in n-step algorithms for optimal control. Unfortunately, this has rarely been addressed in current research. This trade-off principle holds independent of the choice of the algorithm, such as n-step SARSA, n-step Expected SARSA or n-step Tree backup. A small n results in a large bias, while a large n leads to large variance. The literature offers no straightforward recipe for the best choice of this value. While currently all n-step algorithms use a fixed value of n over the state space we extend the framework of n-step updates by allowing each state to have its specific n. We propose a solution to this problem within the context of human aided reinforcement learning. Our approach is based on the observation that a human can learn more efficiently if she receives input regarding the criticality of a given state and thus the amount of attention she needs to invest into the learning in that state. This observation is related to the idea that each state of the MDP has a certain measure of criticality which indicates how much the choice of the action in that state influences the return. In our algorithm the RL agent utilizes the criticality measure, a function provided by a human trainer, in order to locally choose the best stepnumber n for the update of the Q function.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1810.07254

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Basis Adaptation for Sparse Nonlinear Reinforcement Learning

Mahadevan, Sridhar (University of Massachusetts, Amherst) | Giguere, Stephen (University of Massachusetts, Amherst) | Jacek, Nicholas (University of Massachusetts, Amherst)

AAAI ConferencesJul-9-2013

This paper presents a new approach to representation discovery in reinforcement learning (RL) using basis adaptation. We introduce a general framework for basis adaptation as {\em nonlinear separable least-squares value function approximation} based on finding Frechet gradients of an error function using variable projection functionals. We then present a scalable proximal gradient-based approach for basis adaptation using the recently proposed mirror-descent framework for RL. Unlike traditional temporal-difference (TD) methods for RL, mirror descent based RL methods undertake proximal gradient updates of weights in a dual space, which is linked together with the primal space using a Legendre transform involving the gradient of a strongly convex function. Mirror descent RL can be viewed as a proximal TD algorithm using Bregman divergence as the distance generating function. We present a new class of regularized proximal-gradient based TD methods, which combine feature selection through sparse L1 regularization and basis adaptation. Experimental results are provided to illustrate and validate the approach.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

AAAI Conferences

Twenty-Seventh AAAI Conference on Artificial Intelligence

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Add feedback